A statistical procedure for the identification of positrons 
in the PAMELA experiment 



O. Adriani, G. C. Barbarino, G. A. Bazilevskaya, R. Bellotti*, M. Boezio,
E. A. Bogomolov, L. Bonechi, M. Bongi, V. Bonvicini, S. Borisov, S. Bottai,
A. Bruno, F. Cafagna, D. Campana, R. Carbone, P. Carlson, M. Casolino,
G. Castellini, L. Consiglio, M. P. De Pascale, C. De Santis, N. De Simone,
V. Di Felice, A. M. Galper, W. Gillard, L. Grishantseva, P. Hofverberg,
G. Jerse, S. V. Koldashov, S. Y. Krutkov, A. N. Kvashnin, A. Leonov,
V. Malvezzi, L. Marcelli, W. Menn, V. V. Mikhailov, E. Mocchiutti, A. Monaco,
N. Mori, N. Nikonov, G. Osteria, P. Papini, M. Pearce, P. Picozza, M. Ricci,
S. B. Ricciarini, L. Rossetto, M. Simon, R. Sparvoli, P. Spillantini,
Y. I. Stozhkov, A. Vacchi, E. Vannuccini, G. Vasilyev, S. A. Voronov, J. Wu,
Y. T. Yurkin, G. Zampa, N. Zampa, V. G. Zverev, D. Marinucci

University of Florence, Department of Physics, Via Sansone 1, I-50019 Sesto
Fiorentino, Florence, Italy.
INFN, Sezione di Florence, Via Sansone 1, I-50019 Sesto Fiorentino, Florence, Italy.
University of Naples "Federico II", Department of Physics, Via Cintia, I-80126 Naples,
Italy.
INFN, Sezione di Naples, Via Cintia, I-80126 Naples, Italy.
Lebedev Physical Institute, Leninsky Prospekt 53, RU-119991 Moscow, Russia.
University of Bari, Department of Physics, Via Amendola 173, I-70126 Bari, Italy.
INFN, Sezione di Bari, Via Amendola 173, I-70126 Bari, Italy.
INFN, Sezione di Trieste, Padriciano 99, I-34012 Trieste, Italy.
Ioffe Physical Technical Institute, Polytekhnicheskaya 26, RU-194021 St. Petersburg,
Russia.
University of Rome "Tor Vergata", Department of Physics, Via della Ricerca Scientifica
1, I-00133 Rome, Italy.
INFN, Sezione di Roma "Tor Vergata", Via della Ricerca Scientifica 1, I-00133 Rome,
Italy.
Moscow Engineering and Physics Institute, Kashirskoe Shosse 31, RU-11540 Moscow,
Russia.
KTH, Department of Physics, and the Oskar Klein Centre for Cosmoparticle Physics,
AlbaNova University Centre, 10691 Stockholm, Sweden.
IFAC, Via Madonna del Piano 10, I-50019 Sesto Fiorentino, Florence, Italy.
University of Trieste, Department of Physics, Via A. Valerio 2, I-34147 Trieste, Italy.
University of Siegen, D-57068 Siegen, Germany.
INFN, Laboratori Nazionali di Frascati, Via Enrico Fermi 40, I-00044 Frascati, Italy.
University of Rome "Tor Vergata", Department of Mathematics, Via della Ricerca
Scientifica 1, I-00133 Rome, Italy.

* Corresponding author. Tel: +390805443173
Email address: roberto.bellotti@ba.infn.it (R. Bellotti)

Preprint submitted to Astroparticle Physics

January 20, 2010



Abstract 

The PAMELA satellite experiment has measured the cosmic-ray positron
fraction between 1.5 GeV and 100 GeV. The need to reliably discriminate
between the positron signal and the proton background has required the
development of an ad hoc analysis procedure. In this paper, a method for
positron identification is described and its stability and capability to yield
a correct background estimate are shown. The analysis includes new
experimental data, the application of three different fitting techniques for
the background sample and an estimate of systematic uncertainties due to
possible inaccuracies in the background selection. The new experimental
results confirm both solar modulation effects on cosmic rays with low
rigidities and an anomalous positron abundance above 10 GeV.

1. Introduction



Recent measurements of cosmic-ray electrons and positrons carried out
by the ATIC [1], PAMELA [2], FERMI and HESS experiments extend the
previous balloon-borne [9], [10] and satellite [11] measurements and
represent a breakthrough in cosmic-ray physics. In particular, it is
well known that an antimatter component that cannot be explained as an
effect of a purely secondary production mechanism could provide insight
into the nature and distribution of particle sources in our galaxy [12]. The
PAMELA experiment has reported a measurement of the positron fraction,
i.e. the ratio of the positron flux to the sum of the electron and positron
fluxes, $R = \phi(e^+) / (\phi(e^-) + \phi(e^+))$, at energies between 1.5 GeV
and 100 GeV, sampled in 16 energy bins. The observations extend the energy
range of previous positron measurements and unambiguously show an anomalous
positron abundance above 10 GeV. A reliable identification of electrons and
positrons has been performed by combining information from independent
detectors within the apparatus [2], [13]. The main difficulty in the
measurement of R is the dominating background flux from protons, which is
$10^3$ (at 1 GV) and $10^4$ (at 100 GV) times the positron flux. Furthermore,
a precise estimate of the proton contamination in the positron sample is a
difficult task.

A widely adopted approach, both in high energy physics and astrophysics,
consists of an intensive use of simulated signal and background samples to
train different multivariate classifiers, such as artificial neural networks and
support vector machines [15], [16]. It has been demonstrated that such an
approach can improve background rejection in the signal sample [17], [18].
However, this approach can introduce systematic uncertainties which are
difficult to estimate for the real data.

In this paper we present the method used to obtain the PAMELA
positron fraction [2], together with further statistical procedures, based on
wavelet and kernel estimates, used to estimate the proton contamination in the
positron sample. Although our approach is based on well known statistical
techniques, we believe this methodology can be of interest because the
data analysis is mainly based on the discrimination capabilities of a single
detector, i.e. the electromagnetic calorimeter. Previously published results
[2] refer to data collected by the experiment between July 2006 and February
2008. Here, we present the methodology applied to a larger data set collected
between July 2006 and December 2008.

In Section 2 the PAMELA experiment is briefly described. A detailed
description of the apparatus can be found in [19]. In Section 3 the
discriminating variables used for the analysis are presented. In Section 4 the
event selection procedure is described: this is the first phase of the analysis
and it involves all the detectors of the PAMELA apparatus. The core of the
analysis is described in Section 5. The methodologies developed to estimate the
positron fraction R and the statistical and systematic uncertainties are
illustrated and applied to the PAMELA data. A summary of the experimental
results and the conclusions are presented in Section 6.

2. The PAMELA apparatus 

As shown in Fig. 1, the PAMELA apparatus is composed of the following
detectors (from top to bottom):

1. a time-of-flight system (ToF (S1, S2, S3));

2. a magnetic spectrometer; 

3. an anticoincidence system (AC (CARD, CAT, CAS)); 

4. an electromagnetic imaging calorimeter; 

5. a shower tail catcher scintillator (S4) and 

6. a neutron detector. 

The ToF system provides a fast signal for triggering the data acquisition
and measures the time-of-flight and ionization energy losses (dE/dx)
of traversing particles. It also allows down-going particles to be reliably
identified. Multiple tracks, produced in interactions above the spectrometer,
are rejected by requiring that only one strip of the top ToF scintillator
(S1 and S2) layers register an energy deposition (hit). Similarly, no hits were
permitted in either of the top scintillators of the AC system (CARD and CAT).
The magnetic spectrometer consists of a 0.43 T permanent magnet and a
silicon microstrip tracking system. It measures the rigidity of charged
particles through their deflection in the magnetic field. During flight the
spatial resolution is observed to be 3 μm in the bending view, corresponding
to a Maximum Detectable Rigidity (MDR), defined as a 100% uncertainty in the
rigidity determination, exceeding 1 TV. The dE/dx losses measured in S1
and the silicon layers of the magnetic spectrometer were used to select
minimum ionizing singly charged particles (mip) by requiring the measured dE/dx
to be less than twice that expected from a mip. The sampling calorimeter
comprises 44 silicon sensor planes interleaved with 22 plates of tungsten
absorber. Each tungsten layer has a thickness of 0.26 cm, corresponding to
0.74 radiation lengths. A high dynamic-range scintillator system (S4) and
a neutron detector are mounted under the calorimeter. The apparatus is
approximately 130 cm tall, has a mass of about 470 kg and is inserted
inside a pressurized container attached to the Russian Resurs-DK1 satellite.




Figure 1: A schematic overview of the PAMELA satellite experiment. The experiment
stands ~1.3 m high and, from top to bottom, consists of a time-of-flight (ToF) system
(S1, S2, S3 scintillator planes), an anticoincidence shield system, a permanent magnet
spectrometer (the magnetic field runs in the y-direction), a silicon-tungsten
electromagnetic calorimeter, a shower tail scintillator (S4) and a neutron detector. The
experiment has an overall mass of 470 kg.

2.1. The imaging calorimeter

In this analysis the PAMELA silicon-tungsten sampling imaging calorimeter
[20] plays a key role, due to its capability to give an accurate topological
description of the showers generated by the interaction of the cosmic-ray
particles.

Electromagnetic calorimeters have been widely used for particle
discrimination in balloon-borne cosmic-ray experiments [21]. The PAMELA
imaging calorimeter is an evolution of the instrument used in several
balloon-borne experiments [22], [23], [6] and its performance has been
thoroughly investigated by means of test beam data and Monte Carlo
simulations [20]. It is 16.3 radiation lengths (0.6 nuclear interaction
lengths) deep, so both electrons and positrons develop a well-contained
electromagnetic shower in the energy range of interest. In contrast, the
majority of the protons will either pass through the calorimeter as a minimum
ionizing particle or interact deeply in the calorimeter. In fact there is a
high probability (>89%) that an electromagnetic shower will start in the first
3 planes of the calorimeter. For hadronic showers, the starting point is
distributed more uniformly. Particle identification based on the total
measured energy and the starting point of the reconstructed shower in the
calorimeter can be tuned to reject 99.9% of the protons, while selecting more
than 95% of the electrons or positrons. The remaining proton contamination in
the positron sample can be eliminated using additional topological
information, including the lateral and longitudinal profile of the shower.
Using particle beam data collected at CERN it was previously shown that less
than one proton out of 100,000 passes the calorimeter electron selection up to
200 GeV/c, with a corresponding electron selection efficiency of 80%.
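
As a quick consistency check, the quoted calorimeter depth agrees with the absorber geometry given in Section 2, i.e. 22 tungsten plates of 0.74 radiation lengths ($X_0$) each:

```latex
22 \times 0.74\,X_0 = 16.28\,X_0 \approx 16.3\,X_0
```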



3. Discriminating variable selection 

The misidentification of electrons and protons is the largest source of
background when estimating the positron fraction. This can occur if the
sign-of-charge is incorrectly assigned from the spectrometer data, or if
electron- and proton-like interaction patterns are confused in the calorimeter
data. The proton-to-positron flux ratio increases from approximately $10^3$ at
1 GeV to approximately $10^4$ at 100 GeV and represents the major source of
contamination. Robust positron identification is therefore required and the
residual proton background must be carefully assessed. To do this, a single
discriminating variable is considered: the fraction $\mathcal{F}$ of
calorimeter energy deposited inside a cylinder of radius 0.3 Moliere radii.
Fig. 2 shows $\mathcal{F}$ as a function of deflection (rigidity$^{-1}$). The
axis of the cylinder is defined by extrapolating the particle track
reconstructed in the spectrometer. The Moliere radius is an important quantity
in calorimetry as it quantifies the lateral spread of an electromagnetic
shower (about 90% of the shower energy is contained in a cylinder with a
radius equal to 1 Moliere radius), and depends only on the absorbing material
(tungsten in this case). The events shown in Fig. 2 were selected by requiring
a match between the momentum measured by the tracking system and the total
detected energy, and by requiring that the shower starts in the first planes
of the calorimeter. For negatively-signed deflections, electrons are clearly
visible as a horizontal band with $\mathcal{F}$ lying mostly between 0.4 and
0.7. For positively-signed deflections, the similar horizontal band is
naturally associated with positrons, with the remaining points, mostly at
$\mathcal{F}$ < 0.4, designated as proton contamination. The validity of such
event characterization was confirmed using the neutron yield from the
calorimeter and the ionization (dE/dx) losses measured in the spectrometer.
From particle beam tests the spillover limit for positrons is estimated to be
approximately 300 GeV, primarily due to the tracker resolution. The electron
spillover background between 1.5 and 100 GeV is negligible.
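
The definition of the energy fraction can be illustrated with a minimal sketch. It assumes a simplified, hypothetical event format (3D hit positions in cm with one energy per hit) rather than the strip-based readout of the real calorimeter, and it uses an approximate literature value of 0.93 cm for the Moliere radius of tungsten, which is not quoted in the text. The function name `energy_fraction` and its parameters are illustrative only.

```python
import numpy as np

# Assumed Moliere radius of tungsten in cm (approximate literature value,
# not taken from the text).
MOLIERE_RADIUS_W_CM = 0.93


def energy_fraction(hit_xyz, hit_energy, track_point, track_dir,
                    radius_in_moliere=0.3,
                    moliere_radius_cm=MOLIERE_RADIUS_W_CM):
    """Fraction of the summed hit energy lying within a cylinder around the track.

    hit_xyz:     (n, 3) hit positions in cm (hypothetical 3D hit format)
    hit_energy:  (n,) energy deposited in each hit
    track_point: a point on the extrapolated track, in cm
    track_dir:   direction vector of the extrapolated track
    """
    hits = np.asarray(hit_xyz, dtype=float)
    energy = np.asarray(hit_energy, dtype=float)
    direction = np.asarray(track_dir, dtype=float)
    direction = direction / np.linalg.norm(direction)

    # Perpendicular distance of each hit from the cylinder axis.
    rel = hits - np.asarray(track_point, dtype=float)
    along = rel @ direction
    perp = rel - np.outer(along, direction)
    dist = np.linalg.norm(perp, axis=1)

    total = energy.sum()
    if total <= 0:
        return float("nan")
    inside = dist <= radius_in_moliere * moliere_radius_cm
    return float(energy[inside].sum() / total)


# Toy event: a vertical track through the origin and three hits.
hits = [(0.0, 0.0, 1.0), (0.2, 0.0, 2.0), (1.0, 0.0, 3.0)]
energies = [3.0, 1.0, 4.0]
fraction = energy_fraction(hits, energies, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
```

In this toy event the first two hits lie within 0.3 x 0.93 cm of the axis and the third does not, so half of the deposited energy is counted inside the cylinder.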

4. Event selection 

While the distribution shown in Fig. 2 presents a clear positron signature,
the residual proton background distribution must be quantified. It is worth
noting that the background distribution was obtained using the flight
calorimeter data, with no dependence on simulations. In order to
build a background model, the total calorimeter depth of 22 detector planes
was divided into two non-mutually exclusive parts: an upper part comprising
planes 1-20, and a lower part comprising planes 3-22. The positron
component in positively charged events can be significantly reduced by
selecting particles that do not interact in the first 2 planes, because only
2% of electrons and positrons with rigidities greater than 1.5 GV pass this
condition. This requirement selects a nearly pure sample of protons entering
the lower part of the calorimeter (planes 3-22). The event selection
methodology was further validated using particle beam data collected prior to
launch and data generated using the PAMELA Collaboration's official simulation
program. This simulation is based on the GEANT package, version 3.21, and
reproduces the entire PAMELA apparatus.

Figure 2: Calorimeter energy fraction $\mathcal{F}$. The fraction of calorimeter energy
deposited inside a cylinder of radius 0.3 Moliere radii, as a function of deflection. The
number of events per bin is shown in different colours, as indicated in the colour scale.
The axis of the cylinder is defined by extrapolating the particle track reconstructed by
the spectrometer. The events were selected requiring a match between the momentum
measured by the tracking system and the total detected energy and requiring that the
electromagnetic shower starts developing in the first planes of the calorimeter.

Calorimeter variables (e.g. total detected energy and lateral shower
spread) were evaluated for the upper and lower parts of the calorimeter.
Electrons and positrons were identified in the upper part of the calorimeter
using the total detected energy and the starting point of the shower. As an
example, Fig. 3 shows the energy fraction $\mathcal{F}$ for negatively charged
particles in the rigidity range 28-42 GV selected as electrons in the upper
part of the calorimeter (panel a). Panels (b) and (c) show the $\mathcal{F}$
distributions for positively charged particles obtained for the lower and
upper parts of the calorimeter, i.e. protons and a mixture of protons and
positrons, respectively. The distributions in panels (a) and (b) are clearly
different, while panel (c) shows a mixture of the two distributions, which
strongly supports the positron interpretation for the electron-like
$\mathcal{F}$ distribution in the sample of positively charged events.

Figure 3: Calorimeter energy fraction $\mathcal{F}$: 28-42 GV. Panel (a) shows the
distribution of the energy fraction for negatively charged particles, selected as
electrons in the upper part of the calorimeter. Panel (b) shows the same distribution for
positively charged particles selected as protons in the bottom part of the calorimeter.
Panel (c) shows positively charged particles, selected in the upper part of the
calorimeter, i.e. protons and positrons.

5. Positron/proton discrimination

As a result of the event selection described in the last section we obtained
the distributions of pure electrons, pure protons and a mixture of positrons
and protons, as shown in Fig. 3. Starting from these distributions, the
determination of the ratio R, with the statistical and systematic error
estimates, consists of four main steps, as summarised in Fig. 4:

1. estimation of the probability density functions (pdf) for the
experimental distributions shown in Fig. 3;

2. construction of a finite mixture density of probability;

3. estimation of the weight of the mixture by means of the maximum
likelihood indicator;

4. estimation of the statistical errors by means of a bootstrap procedure.

In addition, an estimate of the systematic uncertainties due to inaccuracies
in the background identification is performed.

Figure 4: Flowchart of the methodology, based on different fitting and bootstrap techniques
developed to evaluate the positron fraction R.

5.1. Pdf estimate

The proton experimental distributions provide information about the
background yields. In order to evaluate these distributions and to check
possible systematic errors in this phase of the analysis, three different
methods have been implemented: beta, wavelets and kernel.

5.1.1. Beta pdf

Since the discriminating variable used for the analysis is the energy
fraction $\mathcal{F}$, spanning the interval [0, 1], the rational choice is to
fit the experimental distribution with a function spanning the same interval
and with few free parameters, in order to avoid unphysical modeling of the
experimental data. We used the beta function [28]

$$f(x) = \frac{1}{\beta(p,q)}\, x^{p-1} (1-x)^{q-1}, \tag{1}$$

where $p > 0$, $q > 0$, and $\beta(p,q)$ is

$$\beta(p,q) = \int_0^1 x^{p-1} (1-x)^{q-1}\, dx. \tag{2}$$

This density has been used to fit both electrons and protons. A set of
parameters for each rigidity bin is obtained and used for the subsequent
steps of the analysis.
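
A fit of this kind can be sketched as follows. The text does not state which fitting procedure was used, so the unbinned maximum-likelihood fit below (via SciPy's `stats.beta.fit` with location and scale fixed to the interval [0, 1]) is an assumption, and the toy sample stands in for the measured energy fractions of one rigidity bin. The helper name `fit_beta` is illustrative.

```python
import numpy as np
from scipy import stats


def fit_beta(sample):
    """Maximum-likelihood fit of a beta pdf on [0, 1]; returns (p, q)."""
    # Keep values strictly inside (0, 1), as required by the fit.
    x = np.clip(np.asarray(sample, dtype=float), 1e-6, 1.0 - 1e-6)
    # floc/fscale pin the support to [0, 1], leaving only p and q free.
    p, q, _, _ = stats.beta.fit(x, floc=0, fscale=1)
    return p, q


# Toy "proton-like" sample with known shape parameters (made-up values).
rng = np.random.default_rng(0)
toy_sample = rng.beta(2.0, 5.0, size=5000)
p_hat, q_hat = fit_beta(toy_sample)
```

With only two free parameters, the fitted shape cannot follow statistical fluctuations of individual histogram bins, which is the motivation given above for choosing a function with few parameters.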




Figure 5: Distribution of the energy fraction for positively charged particles selected as
protons for 3 different rigidity bins with a fit to a beta pdf.

The mean of the beta pdf is:

$$\bar{x} = \frac{p}{p+q}, \tag{3}$$

and its variance is:

$$\sigma^2 = \frac{pq}{(p+q)^2 (p+q+1)}. \tag{4}$$
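
The relations for the mean and variance of the beta pdf can be inverted in closed form: writing $s = p + q$, the variance becomes $\bar{x}(1-\bar{x})/(s+1)$, so $s = \bar{x}(1-\bar{x})/\sigma^2 - 1$, $p = \bar{x}s$ and $q = (1-\bar{x})s$. A minimal sketch of this inversion (the function name is illustrative):

```python
def beta_params_from_moments(mean, variance):
    """Return (p, q) of the beta pdf with the given mean and variance."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    # A beta pdf on [0, 1] requires variance < mean * (1 - mean).
    if not 0.0 < variance < mean * (1.0 - mean):
        raise ValueError("variance must lie between 0 and mean * (1 - mean)")
    s = mean * (1.0 - mean) / variance - 1.0  # s = p + q
    return mean * s, (1.0 - mean) * s


# Check with p = 2, q = 5: mean = 2/7, variance = 10/392.
p, q = beta_params_from_moments(2.0 / 7.0, 10.0 / 392.0)
```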

5.1.2. Wavelets 

In the previous section, the distribution of the energy fraction was fitted
by means of a fixed family, i.e. we fitted a parametric law in statistical
jargon. More precisely, we assumed a priori that the experimental
distributions we observed are generated according to a specific (beta) law in
[0, 1]. Although the choice of a beta function is natural for random variables
in this range, it is important to question how much our final results depend
on this assumption, i.e. their degree of robustness when varying the energy
fraction distribution over a much greater range of possibilities. Our goal here
is to explore the possibility of a nonparametric fit, where there is no a priori
assumption on the energy fraction distribution.

Over the last fifteen years, the statistical literature has focused on the
estimation of density functions in this broader nonparametric setting. We refer
for instance to [31] for an introduction to this area of research. A wide
consensus has formed on the role of wavelet-based methods as the most powerful
statistical techniques for nonparametric density estimation.

A wavelet system is essentially an orthonormal basis which is constructed
by dilations and translations of a mother and a father function, leading to a
multiresolution scheme. The wavelets we are going to implement are those
proposed by Daubechies, which are computationally convenient as their
support in the real domain is limited. More explicitly, the father wavelet
satisfies

$$\varphi(x) = \sqrt{2} \sum_k h_k\, \varphi(2x - k), \tag{5}$$

where $h_k$ are suitably chosen weights ([33]), whereas the mother wavelet
satisfies:

$$\psi(x) = \sqrt{2} \sum_k (-1)^{k+1} h_{1-k}\, \varphi(2x - k). \tag{6}$$

Figure 6: Distribution of the energy fraction for positively charged particles selected as
protons for 3 different rigidity bins with a wavelets fit.

The multiresolution expansion of a function $f$ is then provided by:

$$f(x) = \sum_k \alpha_k \varphi_k(x) + \sum_{j,k} \beta_{jk} \psi_{jk}(x), \tag{7}$$

where $\alpha_k$ and $\beta_{jk}$ are approximation and detail coefficients,
respectively, and the elements of the basis are constructed as

$$\psi_{jk}(x) = 2^{j/2}\, \psi(2^j x - k), \qquad j,k = 1, 2, \ldots \tag{8}$$

In practice the coefficients $\alpha_k$ and $\beta_{jk}$ are unknown and must
be estimated from the data. Given $n$ independent identically distributed
random variables $x_1, \ldots, x_n$ with an unknown density $f$ on
$\mathbb{R}$, suitable estimators are provided by

$$\hat{\alpha}_k = \frac{1}{n} \sum_{i=1}^{n} \varphi_k(x_i), \qquad
\hat{\beta}_{jk} = \frac{1}{n} \sum_{i=1}^{n} \psi_{jk}(x_i). \tag{9}$$

These estimators can be viewed as convolutions of the empirical histograms
of the observations with the elements of the wavelet basis. At this stage, an
obvious estimator may be proposed by simply replacing the coefficients
$\alpha$, $\beta$ in (7) by their sample estimates. This approach, the
so-called linear wavelet estimator, has however been shown to be suboptimal
in general ([31] or [32]). On the other hand, a wide consensus has emerged in
the mathematical statistics community on the use of so-called wavelet
thresholding techniques. Here, small coefficients are suppressed by
introducing a threshold. In particular, in this paper a hard thresholding rule
is used. In this case, the estimator for $f$ is defined by:

$$\hat{f}_n(x) = \sum_k \hat{\alpha}_k \varphi_k(x)
+ \sum_{j,k} \hat{\beta}^H_{jk} \psi_{jk}(x), \tag{10}$$

where the coefficients are defined by:

$$\hat{\beta}^H_{jk} = \hat{\beta}_{jk}\, I(|\hat{\beta}_{jk}| > t) \tag{11}$$

(the indicator function is defined as usual, e.g. $I(|X| > t) = 1$ if
$|X| > t$, 0 otherwise). The threshold level is chosen to be

$$t = c \sqrt{\frac{\log n}{n}}, \tag{12}$$

where $c > 0$ is a suitably chosen constant, and $n$ is the number of
observations in our sample. Intuitively, the rationale behind these techniques
can be explained as follows. The smaller sample coefficients can be expected
to be largely dominated by noise, so dropping them will improve the global
performance of the estimates. This argument can be made rigorous; in
particular it can be shown that wavelet thresholding estimators yield basically
the optimal rate of convergence over a wide variety of loss functions, i.e.,
they (nearly) minimize over a wide class of functions $f$ and norms
$\|\cdot\|_p$ the maximum risk

$$M(\hat{f}_n, f) = \max \left( \| \hat{f}_n - f \|_p^p \right). \tag{13}$$

In practice, this means wavelet thresholding techniques enjoy robustness
properties which are important in our context. They are sensitive to
large scale features of the unknown energy distribution and, at the same time,
they are also expected to detect the possible existence of small scale effects,
such as local density spikes which could affect the final result. We refer again
to [31] and [32] for further details and discussion.

To fit the proton sample, the wavelet thresholding technique with the
Daubechies basis (in particular db3) has been used. A critical step has been
to find the best value for the parameter c in (12). We started by fixing c = 3,
which is often recommended as a rule-of-thumb choice. We further verified
by numerical experiments that our results are very stable for a wide range
of fluctuations around this value. The positron fraction estimates
and corresponding confidence intervals are very close to (indeed, in some
cases nearly indistinguishable from) those obtained with the parametric fit of
the beta distribution.
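
The hard thresholding idea can be illustrated with a short, self-contained sketch. To avoid depending on a wavelet library, it swaps the Daubechies db3 basis used in the analysis for the much simpler Haar basis on [0, 1), so it is not the authors' implementation. It also assumes a threshold of the form $t = c\sqrt{\log n / n}$ with c = 3; the function names, the number of resolution levels and the toy sample are illustrative choices.

```python
import numpy as np


def haar_psi(u):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where((u >= 0.0) & (u < 0.5), 1.0,
                    np.where((u >= 0.5) & (u < 1.0), -1.0, 0.0))


def psi_jk(x, j, k):
    """Dilated and translated wavelet 2^(j/2) * psi(2^j x - k)."""
    return 2.0 ** (j / 2.0) * haar_psi(2.0 ** j * np.asarray(x, dtype=float) - k)


def haar_threshold_density(sample, grid, max_level=5, c=3.0):
    """Hard-thresholded Haar density estimate on [0, 1), evaluated on grid."""
    x = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = x.size
    threshold = c * np.sqrt(np.log(n) / n)

    # Father (approximation) term: the Haar father function is 1 on [0, 1).
    alpha = np.mean((x >= 0.0) & (x < 1.0))
    estimate = alpha * np.ones_like(grid)

    kept = 0
    for j in range(max_level):
        for k in range(2 ** j):
            beta = psi_jk(x, j, k).mean()   # empirical detail coefficient
            if abs(beta) > threshold:       # hard thresholding
                estimate += beta * psi_jk(grid, j, k)
                kept += 1
    return estimate, kept


# Toy sample and an evaluation grid of 32 bin centres on [0, 1).
rng = np.random.default_rng(3)
toy_sample = rng.beta(2.0, 5.0, size=4000)
grid = (np.arange(32) + 0.5) / 32
density, n_kept = haar_threshold_density(toy_sample, grid)
```

Coefficients in regions with few or no events fall below the threshold and are dropped, which is how the estimator suppresses noise while keeping the large-scale shape.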

5.1.3. Kernel estimate 

The kernel estimate is a statistical technique used to obtain an unbinned
and nonparametric estimate of the probability density function. In the
univariate case, the general kernel estimate of the parent distribution is
given by:

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right), \tag{14}$$

where $x_i$ represents the data, $n$ is the number of data points and $h$ is
the smoothing parameter (also called the bandwidth). It is important to note
that $\hat{f}(x)$ is bin-independent regardless of the choice of $K$. The
kernel $K$ distributes the contribution of each data point in the evaluation
of the probability density function, whereas $h$ sets the scale of the kernel.

Since the discriminating variable is defined in the bounded interval [0, 1],
a beta kernel has been used [38]. The beta kernel is a non-negative kernel
and it is usually considered for estimating probability density functions with
compact support.
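
As an illustration, one common form of beta kernel estimator evaluates, at each point x, a beta pdf with shape parameters x/b + 1 and (1 - x)/b + 1 at every data point and averages the results. The text does not specify the exact form used in the analysis, so this choice, the bandwidth b = 0.02 and the toy sample below are assumptions.

```python
import numpy as np
from scipy import stats


def beta_kernel_density(sample, grid, bandwidth=0.02):
    """Beta-kernel density estimate on [0, 1], evaluated on grid."""
    x = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # One beta kernel per grid point; its shape depends on the grid position,
    # so no probability leaks outside the interval [0, 1].
    p = grid / bandwidth + 1.0
    q = (1.0 - grid) / bandwidth + 1.0
    # Evaluate every kernel at every data point and average over the data.
    return stats.beta.pdf(x[None, :], p[:, None], q[:, None]).mean(axis=1)


rng = np.random.default_rng(5)
toy_sample = rng.beta(2.0, 5.0, size=3000)
grid = np.linspace(0.0, 1.0, 201)
estimate = beta_kernel_density(toy_sample, grid)
# Trapezoidal integral of the estimate over [0, 1].
area = np.sum((estimate[1:] + estimate[:-1]) / 2.0 * np.diff(grid))
```

This form of the estimator is non-negative but does not integrate exactly to one, so the area is only expected to be close to unity for small bandwidths.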

The number of beta functions generated was equal to the number of bins
of the histogram and each beta function had a mean equal to the center of
the corresponding histogram bin. The standard deviation of these functions has
been chosen through an application of the Kolmogorov-Smirnov test so that the
initial distribution of protons and the modified one were statistically
compatible, i.e. the null hypothesis of a common parent distribution is not
rejected at the 5% level.

The beta parameters (p, q) have been calculated by inverting (3) and (4).
The kernel bandwidth is assumed to be the histogram bin width. The number of
events in each histogram of protons has been increased sixfold compared to
the original histogram. Fig. 7 shows the real protons and the pseudo-proton
set for rigidity between 28 GV and 42 GV. Each pseudo-proton sample is then
analyzed in the same way as the real protons (in particular with the wavelet
fit), obtaining, for each energy bin, a new positron fraction.

Figure 7: The distribution of positively charged particles selected as protons (grey) for
the rigidity bin 28-42 GV and the same distribution modified using the kernel method
(dark-grey).

5.2. Finite mixture density 

A finite mixture of distributions is used for modelling a dataset extracted
from a non-homogeneous population. It is useful for analysing a sample drawn
from an unknown mixture of known distributions. In the finite mixture
procedure, an experimental distribution may be approximated as
a linear combination of probability density functions (pdfs) [26]:

$$g(x, p) = \sum_{i=1}^{n} p_i f_i(x), \tag{15}$$

where $g(x,p)$ is the pdf to estimate, $f_i(x)$ are known pdfs, $n$ is the
number of pdfs, and $p_i$ are the mixing proportions ($0 < p_i < 1$ and
$\sum_{i=1}^{n} p_i = 1$) to estimate.

In the present analysis we model, for each energy interval, the distribution
of the calorimeter energy fraction ($\mathcal{F}$) for positively charged
particles as a mixture distribution [26] of the positron and proton pdfs:

$$g(\mathcal{F}) = p\, f_b(\mathcal{F}) + (1 - p)\, f_s(\mathcal{F}), \tag{16}$$

where $f_b(\mathcal{F})$ and $f_s(\mathcal{F})$ are the probability density
functions for protons and electrons, respectively, determined as described in
the previous section. As a result of this phase of the analysis a set of
unknown weights $p_j$, with j = 1,...,16 (one per rigidity bin), is obtained.

5.3. Maximum Likelihood 

In order to find the values of the unknown weights $p_j$ we used the well-known
maximum likelihood method. In the present case the likelihood function (for
each rigidity bin) is

$$L_j = \prod_{t=1}^{m} \left[ p_j f_b(\mathcal{F}_t)
+ (1 - p_j) f_s(\mathcal{F}_t) \right], \tag{17}$$

where $m$ is the number of independent observations
$\mathcal{F}_1, \mathcal{F}_2, \ldots, \mathcal{F}_m$ in each rigidity bin.

The estimation of the parameters $p_j$ is done by maximizing the natural
logarithm of (17):

$$\frac{\partial \ln L_j}{\partial p_j} = 0. \tag{18}$$

As a result of the three steps of the analysis, three different weights of
the mixture are obtained for each energy bin, using the beta functions,
the wavelet transform and the kernel technique, respectively. In the next
section the bootstrap technique is introduced. It has been used to evaluate
both the positron fraction R and the statistical errors of the measurements.
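
The maximization of the mixture likelihood for a single rigidity bin can be sketched numerically as below. The two component pdfs are made-up beta shapes standing in for the fitted proton and electron distributions, the true weight of 0.7 is arbitrary, and the use of SciPy's bounded scalar minimizer is an implementation choice, not something stated in the text.

```python
import numpy as np
from scipy import optimize, stats


def fit_background_weight(values, pdf_background, pdf_signal):
    """Return the weight p maximizing sum(log(p * f_b + (1 - p) * f_s))."""
    fb = pdf_background(values)
    fs = pdf_signal(values)

    def negative_log_likelihood(p):
        return -np.sum(np.log(p * fb + (1.0 - p) * fs))

    # The log-likelihood is concave in p, so a bounded 1D search is enough.
    result = optimize.minimize_scalar(
        negative_log_likelihood, bounds=(1e-6, 1.0 - 1e-6), method="bounded")
    return float(result.x)


# Toy mixture: 70% "proton-like" and 30% "electron-like" events (made-up shapes).
rng = np.random.default_rng(1)
background = stats.beta(2.0, 6.0)
signal = stats.beta(12.0, 8.0)
n_events = 4000
is_background = rng.random(n_events) < 0.7
values = np.where(is_background,
                  rng.beta(2.0, 6.0, n_events),
                  rng.beta(12.0, 8.0, n_events))
p_hat = fit_background_weight(values, background.pdf, signal.pdf)
```

Given the fitted weight, the estimated number of signal events in the bin is (1 - p) times the number of positively charged events, which is the quantity entering the positron fraction.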

5.4. Statistical error estimates by means of the bootstrap technique

The bootstrap is a powerful method for analyzing small, expensive-to-collect
data sets where prior information is sparse [34]. In this method,
a set of data is randomly resampled many times with replacement. Then
statistical indicators, such as the standard error or the confidence interval,
are evaluated from these new samples [35].

Figure 8: The distribution of positively charged particles for the rigidity bin 28-42 GV
showing 3 pdf fits.

This procedure has been used to estimate the statistical error on the ratio R.

Each experimental distribution for electrons, protons and positively
charged particles has been resampled 1000 times, then the three steps of
the analysis procedure previously described have been repeated. For each
rigidity bin, a statistical distribution of the ratio R is thereby obtained.
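
The resampling idea can be sketched generically as follows. In the analysis the statistic is the full fitting chain applied to each resampled distribution; here a simple sample mean on a toy data set stands in for it, and the interval is taken between the 16th and 84th percentiles of the bootstrap distribution. The function name and the toy data are illustrative.

```python
import numpy as np


def bootstrap_interval(sample, statistic, n_resamples=1000, seed=0):
    """Bootstrap a statistic; return (mean, 16th percentile, 84th percentile)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(sample)
    estimates = np.empty(n_resamples)
    for b in range(n_resamples):
        # Draw a resample of the same size, with replacement.
        resample = rng.choice(x, size=x.size, replace=True)
        estimates[b] = statistic(resample)
    low, high = np.percentile(estimates, [16, 84])
    return estimates.mean(), low, high


# Toy data set; np.mean stands in for the full analysis chain.
rng = np.random.default_rng(42)
toy_data = rng.normal(0.5, 0.1, size=400)
center, low, high = bootstrap_interval(toy_data, np.mean)
```

For an approximately Gaussian bootstrap distribution, the 16th to 84th percentile range corresponds to a one standard deviation interval around the central value.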

As a first step, M = 1000 bootstrap resampling of positives sample were 
applied. For each re-sample i the unknown parameter pi was estimated 
by means of an un binned maximum likelihood analysis. As a second step 
the procedure has been repeated = 1000 times applying A^ bootstrap 
resampling of electron and proton sample. So A^ x M estimations of the 
number of positron candidates have been obtained. Then, the final number 
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Rigidity at 


Percent error 


Percent error 


Percent error 


spectrometer (GV) 


(beta) 


(wavelets) 


(kernel with wavelets) 


1.5 - 1.8 


3.2% 


2.6% 


2.6% 


1.8-2.2 


2.6% 


2.9% 


2.6% 


2.2 - 2.7 


2.7% 


2.6% 


2.6% 


2.7-3.3 


2.9% 


3.1% 


3.1% 


3.3-4.1 


3.1% 


3.9% 


3.9% 


4.1-5.0 


3.6% 


3.8% 


4.3% 


5.0-6.1 


3.9% 


5.7% 


5.3% 


6.1 - 7.4 


4.7% 


4.8% 


4.4% 


7.4-9.1 


4.9% 


4.9% 


5.0% 


9.1 - 11.2 


4.7% 


5.7% 


5.9% 


11.2 - 15.0 


5.3% 


5.0% 


5.6% 


15.0-20.0 


6.1% 


5.4% 


6.3% 


20.0-28.0 


8.1% 


7.5% 


8.2% 


28.0-42.0 


10.1% 


9.5% 


11.2% 


42.0-65.0 


13.4% 


12.4% 


13.0% 


65.0- 100.0 


25% 


29.5% 


25.3% 



Table 1: Statistical errors on the positron fraction R for all rigidity bins. 



of positron candidates was obtained as: 

N M 

-=]7Bm5:-.«) (19) 

i=i j=i 

where nji in the number of positron candidates evaluated by each bootstrap 
iteration. Therefore also N x M estimations of positron fraction have been 
obtained. In the present analysis we used the range from the 16th and 84th 
percentiles of these distributions as the statistical error estimates of the ratio 
R. As shown in Table [1] the statistical errors on the points range between 
3% and 10% in all bins but the last two and then increase to just under 
30% in the highest energy bin. Fig. |9] shows three new estimates of the 
positron fraction, using the different fitting techniques adopted in this study. 
Moreover, as shown in Tab. [2], the results obtained with the three different 
background pdfs are consistent with each other. 

Figure 9: The positron fraction R obtained using the wavelets-fit (blue), beta-fit (red) and
kernel-with-wavelets-fit (green).

5.5. Systematic uncertainties due to inaccuracies in the background selection

The main sources of systematic uncertainties in the determination of the 
positron fraction are investigated in the following. Due to the equivalence of 
the results obtained in the previous section with the three different pdfs, the 
evaluation of the systematic uncertainties has been performed using only the 
beta fit. 

This is done by introducing a modification in the background distribution
using the weighted bootstrap technique. This technique applies a positive
weight to each observation of the dataset [39]. For each rigidity bin,
starting from a proton sample $n(\mathcal{F})$ with mean $x$, two new
samples, $n^{+}(\mathcal{F})$ and $n^{-}(\mathcal{F})$, are generated:

1. $n^{+}(\mathcal{F})$ with mean $x^{+} > x$;

2. $n^{-}(\mathcal{F})$ with mean $x^{-} < x$.

The bootstrap weights are chosen in order to have both $n^{+}(\mathcal{F})$ and
$n^{-}(\mathcal{F})$ statistically incompatible with $n(\mathcal{F})$, according to
the Kolmogorov-Smirnov test.
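
A minimal sketch of how such shifted samples could be produced is given below. The text does not state how the weights are chosen, so the exponential tilting used here, the 5% significance level, the asymptotic Kolmogorov-Smirnov critical value and all function names are assumptions made for illustration only. Note also that the resampled set is drawn from the original one, so the test is used here only as a rough distance criterion.

```python
import math
import random
import statistics


def weighted_bootstrap(sample, weights, rng):
    """Resample with replacement, each observation drawn with probability
    proportional to its positive weight."""
    return rng.choices(sample, weights=weights, k=len(sample))


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest distance
    between the two empirical cumulative distributions."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def tilt_until_incompatible(sample, sign, rng, alpha=0.05, step=0.05, max_steps=200):
    """Increase an exponential tilt (assumed weight shape) until the
    reweighted sample fails the KS test against the original one."""
    mean = statistics.fmean(sample)
    sd = statistics.pstdev(sample)
    n = len(sample)
    # Asymptotic critical value for two samples of equal size n.
    d_crit = math.sqrt(-0.5 * math.log(alpha / 2.0)) * math.sqrt(2.0 / n)
    for k in range(1, max_steps + 1):
        strength = sign * step * k
        weights = [math.exp(strength * (x - mean) / sd) for x in sample]
        shifted = weighted_bootstrap(sample, weights, rng)
        if ks_statistic(sample, shifted) > d_crit:
            return shifted, strength
    raise RuntimeError("no incompatible sample found within max_steps")


# Toy proton sample (assumed Gaussian, for illustration only).
demo_rng = random.Random(7)
proton_sample = [demo_rng.gauss(0.0, 1.0) for _ in range(1000)]
n_plus, _ = tilt_until_incompatible(proton_sample, +1, demo_rng)
n_minus, _ = tilt_until_incompatible(proton_sample, -1, demo_rng)
print(statistics.fmean(n_minus), statistics.fmean(proton_sample), statistics.fmean(n_plus))
```

Stepping the tilt up gradually returns the mildest modification that the test already rejects, which keeps the two modified samples close to the boundary of compatibility rather than arbitrarily far from the original distribution.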

Rigidity at         Mean kinetic energy        Extrapolated positron fraction at top of payload
spectrometer (GV)   at top of payload (GeV)    with beta-fit               with wavelets-fit           with kernel-with-wavelets-fit
                    beta / wavelets / kernel

1.5 - 1.8           1.65 / 1.65 / 1.65         0.0639 +[illeg.]/-0.0017    0.0673 +0.0021/-0.0021      0.0670 +0.0017/-0.0017
1.8 - 2.2           1.99 / 1.99 / 1.99         0.0591 +0.0015/-0.0015      0.0618 +0.0018/-0.0018      0.0618 +0.0016/-0.0016
2.2 - 2.7           2.44 / 2.44 / 2.44         0.0564 +0.0015/-0.0015      0.0598 +0.0017/-0.0014      0.0587 +0.0016/-0.0014
2.7 - 3.3           2.99 / 2.99 / 2.99         0.0549 +[illeg.]/-0.0016    0.0540 +0.0016/-0.0017      0.0546 +0.0018/-0.0015
3.3 - 4.1           3.67 / 3.68 / 3.68         [illeg.]                    0.0516 +0.0019/-0.0021      0.0508 +[illeg.]/-[illeg.]
4.1 - 5.0           4.49 / 4.51 / 4.52         0.0545 +0.0020/-0.0020      0.0524 +[illeg.]/-0.0020    0.0515 +0.0022/-0.0022
5.0 - 6.1           5.68 / 5.38 / 5.49         0.0602 +[illeg.]/-0.0024    0.0520 +[illeg.]/-0.0029    [illeg.] +[illeg.]/-0.0028
6.1 - 7.4           6.78 / 6.80 / 7.02         0.0522 +0.0024/-0.0024      0.0500 +0.0024/-0.0025      0.0492 +[illeg.]/-0.0022
7.4 - 9.1           8.27 / 8.28 / 8.30         0.0576 +0.0028/-0.0028      0.0504 +0.0027/-0.0027      0.0543 +0.0027/-0.0027
9.1 - 11.2          10.16 / 10.17 / 10.18      0.0570 +0.0033/-0.0033      0.0541 +[illeg.]/-0.0031    0.0518 +0.0029/-0.0032
11.2 - 15.0         13.11 / 13.12 / 13.13      0.0611 +0.0032/-0.0033      [illeg.]                    0.0595 +0.0031/-0.0035
15.0 - 20.0         17.50 / 17.51 / 17.51      0.0630 +0.0039/-0.0039      0.0628 +[illeg.]/-[illeg.]  0.0590 +[illeg.]/-0.0036
20.0 - 28.0         23.99 / 24.00 / 24.01      0.0645 +0.0052/-0.0052      [illeg.] +[illeg.]/-0.0051  0.0592 +[illeg.]/-0.0051
28.0 - 42.0         34.97 / 35.00 / 34.99      [illeg.] +0.0073/-0.0074    [illeg.] +0.0079/-0.0077    [illeg.]
42.0 - 65.0         53.43 / 53.44 / 53.48      [illeg.]                    [illeg.]                    [illeg.]
65.0 - 100.0        82.39 / 82.41 / 82.47      [illeg.]                    [illeg.]                    [illeg.] +0.034/-0.033

Table 2: Summary of the positron fraction results for the beta-fit, wavelets-fit and kernel-with-wavelets-fit. The errors are
defined by the range between the 16th and the 84th percentiles in the R distributions.

Figure 10: The distribution of positively charged particles selected as protons for the
rigidity bin 28 - 42 GV and the same distribution when modified using the weighted
bootstrap technique.



Fig. [10] shows the distribution of protons for the rigidity bin 28 - 42 GV and
the same distribution when modified using the weighted bootstrap technique. The
range encompassing $R^{-}$ and $R^{+}$, the positron fractions obtained with the
two modified samples, is assumed as an estimate of the systematic uncertainty due
to the inaccuracies in the background selection. Tab. [3] reports the systematic
uncertainties assessed for each rigidity bin.

Rigidity at         Mean kinetic energy     Extrapolated positron fraction    Systematic
spectrometer (GV)   at top of payload,      at top of payload                 uncertainties
                    beta-fit (GeV)          with beta-fit

1.5 - 1.8           1.65                    0.0639 +[illeg.]/-0.0017          +0.0010/-0.0017
1.8 - 2.2           1.99                    0.0591 +0.0015/-0.0015            +0.0011/-0.0018
2.2 - 2.7           2.44                    0.0564 +0.0015/-0.0015            +0.0012/-0.0014
2.7 - 3.3           2.99                    0.0549 +[illeg.]/-0.0016          +0.0012/-0.0013
3.3 - 4.1           3.67                    [illeg.] +0.0017/-0.0017          +0.0011/-0.0013
4.1 - 5.0           4.49                    0.0545 +0.0020/-0.0020            +0.0018/-0.0014
5.0 - 6.1           5.68                    0.0602 +[illeg.]/-0.0024          +0.0024/-0.0015
6.1 - 7.4           6.78                    0.0522 +0.0024/-0.0024            +0.0024/-0.0016
7.4 - 9.1           8.27                    0.0576 +0.0028/-0.0028            +0.0038/-0.0018
9.1 - 11.2          10.16                   0.0570 +0.0033/-0.0033            +0.0028/-0.0019
11.2 - 15.0         13.11                   0.0611 +0.0032/-0.0033            +0.0028/-0.0018
15.0 - 20.0         17.50                   0.0630 +0.0039/-0.0039            +0.0033/-0.0020
20.0 - 28.0         23.99                   0.0645 +0.0052/-0.0052            +0.0045/-0.0030
28.0 - 42.0         34.97                   [illeg.] +0.0073/-0.0074          +0.0057/-0.0044
42.0 - 65.0         53.43                   [illeg.] +[illeg.]/-0.013         +0.013/-0.008
65.0 - 100.0        82.39                   [illeg.] +[illeg.]/-0.030         +0.037/-0.044

Table 3: Summary of positron fraction results, obtained with the beta-fit, including
statistical and systematic errors.



6. Experimental results and conclusions 

Fig. [11] shows the positron fraction R obtained through the beta-fit with
statistical and systematic errors summed in quadrature, compared with the
PAMELA positron fraction previously reported [2]. The solid line shows
a calculation by Moskalenko & Strong [40] for pure secondary production
of positrons during the propagation of cosmic-rays in the galaxy. Proton-positron
discrimination is provided by the imaging calorimeter, and the capability
to yield a trustworthy estimate of the positron and electron numbers in the
cosmic radiation at energies between 1.5 GeV and 100 GeV has been clearly
established. Compared to what is reported in [2], a) new experimental data,
b) the application of three novel background models and c) an estimate of the
systematic uncertainties have been presented. The new experimental results
are in agreement with those reported in [2] and confirm both solar modulation
effects on cosmic-rays with low rigidities and an anomalous positron
abundance above 10 GeV.
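
As a worked illustration of summing statistical and systematic errors in quadrature, the sketch below combines the two contributions for the 1.8 - 2.2 GV bin of Table 3. Treating the upper and lower errors separately is an assumption of this example; the text does not say how the asymmetric errors were combined, and the function name is hypothetical.

```python
import math


def combine_in_quadrature(stat, syst):
    """Combine (upper, lower) error magnitudes of two independent
    contributions, handling the upper and lower sides separately."""
    upper = math.hypot(stat[0], syst[0])
    lower = math.hypot(stat[1], syst[1])
    return upper, lower


# 1.8 - 2.2 GV bin of Table 3: statistical +0.0015/-0.0015, systematic +0.0011/-0.0018.
upper, lower = combine_in_quadrature((0.0015, 0.0015), (0.0011, 0.0018))
print(f"total error: +{upper:.4f} / -{lower:.4f}")
```

For this bin the combined error is about +0.0019 / -0.0023, so the systematic term dominates the lower side while the statistical term dominates the upper side.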

Figure 11: The positron fraction R obtained using a beta-fit with statistical and systematic
errors summed in quadrature (red), compared with the positron fraction reported in [2]
(black). The solid line shows a calculation by Moskalenko & Strong [40] for pure secondary
production of positrons during the propagation of cosmic-rays in the galaxy.